[MLOB-7510] added session-level eval documentation by jennm · Pull Request #36958 · DataDog/documentation

jennm · 2026-05-22T18:02:53Z

What does this PR do? What is the motivation?

This PR adds documentation for session-level evals.

Merge instructions

Merge readiness:

Ready for merge

For Datadog employees:

Your branch name MUST follow the <name>/<description> convention and include the forward slash (/). Without this format, your pull request will not pass CI, the GitLab pipeline will not run, and you won't get a branch preview. Getting a branch preview makes it easier for us to check any issues with your PR, such as broken links.

If your branch doesn't follow this format, rename it or create a new branch and PR.

[6/5/2025] Merge queue has been disabled on the documentation repo. If you have write access to the repo, the PR has been reviewed by a Documentation team member, and all of the required checks have passed, you can use the Squash and Merge button to merge the PR. If you don't have write access, or you need help, reach out in the #documentation channel in Slack.

AI assistance

Additional notes

github-actions · 2026-05-22T18:07:46Z

Preview links (active after the `build_preview` check completes)

New or renamed files

https://docs-staging.datadoghq.com/jenn/MLOB-7510/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/session_level_evaluations

Modified Files

https://docs-staging.datadoghq.com/jenn/MLOB-7510/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/trace_level_evaluations

jeff-morgan-dd · 2026-05-26T18:12:50Z

Created DOCS-14508 for editorial review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cswatt · 2026-05-28T18:17:31Z

Pushed some formatting changes, as well as a significant amount of section reorganization, most notably:

- Moved "Configure a session-level evaluation" to the top. Many users don't need to be convinced to use this feature (they may already be convinced) and come to this page specifically to learn how to use it, so this information needs to be above the fold. The documentation shouldn't make sales arguments.
- Removed "why session-level evaluations are needed": it contained marketing language and redundant information. The table alone makes the point this section was meant to make, so I kept the table and moved it to "Choosing the right scope" (renamed from "When to use session scope over trace or span scope").
- Removed "What session-level evaluations are most useful for": this section was more about convincing users to use the feature. That's fine to mention briefly in documentation, but it shouldn't be a focus.

cswatt

Please review the changes I've made, since the structure of the page has been significantly altered. If you absolutely need to add back any of the justification material I've removed, we can discuss!

I also noticed what seems to be a discrepancy in the configuration instructions.

rashel-ddog · 2026-05-29T17:26:39Z

+  text: "Tracking user sessions"
+---
+
+A session-level evaluation runs once per [user session][9], with every trace—and every span in those traces—available to the LLM judge in a single prompt. Sessions group related interactions under a shared `session_id` (for example, a chat conversation) and can include multiple traces over an extended interaction.


The opening leads with mechanics ("runs once per user session") before the reader knows what the feature does for them.

Suggestion: something like the following.
A session-level evaluation runs a custom LLM-as-a-judge evaluation across an entire [user session], including every trace and every span in those traces, in a single prompt. Use it to score things that only make sense across a whole interaction: whether the user's goal was met, whether the assistant stayed coherent across turns, or whether the user grew frustrated over time.

Sessions group related interactions under a shared `session_id` (for example, a chat conversation) and can span multiple traces. Session scope sees context that trace- and span-level judges cannot, because those judges see only a single trace or span.

rashel-ddog · 2026-05-29T17:27:52Z

+
+   {{< img src="llm_observability/evaluations/session_level_evaluation_scope.png" alt="The Evaluate On scope picker with Session selected." style="width:100%;" >}}
+
+   <div class="alert alert-info">A session is considered complete after 30 minutes of inactivity (no new spans for that session, measured from the most recent span), at which point the evaluation runs. Spans that arrive more than 30 minutes after the previous span are not included in the evaluation.</div>


We don't need this; there is a whole section for it.
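The 30-minute completion rule in the quoted alert can be illustrated with a minimal Python sketch. It is not Datadog's implementation. It assumes a session is represented only by its span timestamps, treats a gap of exactly 30 minutes as still inside the session (the alert only says that spans arriving *more than* 30 minutes after the previous span are excluded), and uses a hypothetical helper named `evaluated_spans`.

```python
from __future__ import annotations

from datetime import datetime, timedelta

# Inactivity window from the alert text; assumed here to be a fixed constant.
SESSION_TIMEOUT = timedelta(minutes=30)


def evaluated_spans(span_times: list[datetime]) -> tuple[list[datetime], datetime | None]:
    """Hypothetical helper: return (spans included in the evaluation, time the evaluation runs).

    Assumption: a gap of exactly SESSION_TIMEOUT keeps the span in the session;
    only gaps strictly longer than the timeout close it.
    """
    if not span_times:
        return [], None
    ordered = sorted(span_times)
    included = [ordered[0]]
    for t in ordered[1:]:
        # A gap longer than the timeout means the session was already complete,
        # so this span and every later one are left out of the evaluation.
        if t - included[-1] > SESSION_TIMEOUT:
            break
        included.append(t)
    # The evaluation runs once the timeout elapses after the most recent included span.
    return included, included[-1] + SESSION_TIMEOUT
```

For spans at minutes 0, 10, 35, and 70, this sketch includes the first three spans and runs the evaluation at minute 65. The span at minute 70 arrives 35 minutes after the previous span, so it falls outside the evaluated session.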

rashel-ddog · 2026-05-29T17:29:34Z

+
+1. Pick a sample session from the panel on the right. The pane lists the traces in that session, with the fields referenced by your prompt highlighted.
+
+   {{< img src="llm_observability/evaluations/session_level_sample_session.png" alt="The configuration page in session scope, with the sample session pane on the right showing traces and highlighted span fields." style="width:100%;" >}}


Suggest removing this. The image below is enough.

rashel-ddog · 2026-05-29T17:29:46Z

+
+   {{< img src="llm_observability/evaluations/session_level_sample_session.png" alt="The configuration page in session scope, with the sample session pane on the right showing traces and highlighted span fields." style="width:100%;" >}}
+
+   Clicking on a session then lists the traces in that session, with the fields referenced by your prompt highlighted.


Suggest removing this text.

rashel-ddog · 2026-05-29T19:35:14Z

Also:

I think we need to update this to include session syntax: https://docs.datadoghq.com/llm_observability/evaluations/custom_llm_as_a_judge_evaluations/prompt_templating
I would also update the ordering to start with sessions, then traces, then spans, so that it's hierarchical.

added initial text for session documentation

0eaa11f

github-actions Bot added the Architecture label May 22, 2026

updated some of the screenshots

b806ac3

github-actions Bot added the Images label May 22, 2026

jennm added 2 commits May 26, 2026 11:02

added all screen shots

e64a3fa

removed some things that are not currently relevant

5c080cc

jennm marked this pull request as ready for review May 26, 2026 15:53

jennm requested a review from a team as a code owner May 26, 2026 15:53

jeff-morgan-dd self-assigned this May 26, 2026

jeff-morgan-dd added the editorial review label May 26, 2026

jeff-morgan-dd removed their assignment May 26, 2026

updated evaluation scope image

9397953

updated evaluation scope image for trace level evals

e7c39a6